Optimizing access to page table entries in processor-based devices

ABSTRACT

Optimizing access to page table entries in processor-based devices is disclosed. In this regard, an instruction decode stage of an execution pipeline of a processor-based device receives a memory access instruction including a virtual memory address. A page table walker circuit of the processor-based device determines, based on the memory access instruction, a number T of page table walk levels to traverse, where T is greater than zero (0) and less than or equal to a number of page table walk levels required to fully translate the virtual memory address. The page table walker next performs a page table walk of T page table walk levels of the multilevel page table, and identifies a physical memory address corresponding to a page table entry of the Tth page table walk level. The processor-based device then performs a memory access operation indicated by the memory access instruction using the physical memory address.

FIELD OF THE DISCLOSURE

The technology of the disclosure relates to accessing page table entries in processor-based devices and, more particularly, to accessing page table entries in multilevel page tables.

BACKGROUND

Page tables are data structures used by modern processor-based devices to provide virtual memory functionality. A page table provides page table entries that store mappings between virtual memory addresses and corresponding physical memory addresses (i.e., addresses of memory locations in a system memory). When a processor-based device needs to translate a virtual memory address into a physical memory address, the processor-based device accesses the page table to locate the page table entry associated with the virtual memory address, and then reads the corresponding physical memory address from the page table entry. Recently accessed mappings may also be cached by the processor-based device in a translation lookaside buffer (TLB) for subsequent reuse without the need to repeat the translation process. By using page tables to implement virtual memory functionality, the processor-based device enables software processes to access secure memory spaces that are isolated from one another, and that together may be conceptually larger than the available physical memory.

A multilevel page table is a page table variant that makes use of multiple page tables organized into a hierarchical data structure. To translate a virtual memory address using a multilevel page table, a hardware element provided by the processor-based device and known as a “page table walker” performs a “page table walk.” In the first step of the page table walk, the page table walker uses a base address pointing to the highest-level page table in the multilevel page table, and applies the topmost set of bits of the virtual memory address as an index to access a page table entry in the highest-level page table. That page table entry provides a pointer to a next-lower page table, which the page table walker uses in combination with a next-lower set of bits of the virtual memory address to access a page table entry in the next-lower page table. The page table entry in the next-lower page table contains a pointer to another next-lower page table, and so on. The page table at the lowest level of the multilevel page table provides a pointer to a physical memory page, which is used in combination with the bottommost set of bits of the virtual memory address to determine the physical memory address corresponding to the virtual memory address.

In some scenarios, it may be desirable to enable a software process to modify the contents of the page tables themselves (e.g., to update a physical memory address stored in a page table entry of a page table, or to modify access permissions on a corresponding memory page, as non-limiting examples). To do so, the software process seeking to modify the page table entry must first acquire the physical memory address of that page table entry within the system memory. One approach to obtaining the physical memory address of a page table entry involves recursive mapping of each page table of a multilevel page table, such that the last page table entry (i.e., the page table entry with the highest index associated with the virtual memory address) of each page table stores a pointer to that page table. Before the software process executes a memory access operation on the page table entry, the software process first performs a right bit shift on the virtual memory address, and populates the upper bits of the virtual memory address with ones (1s). The page table walker then performs a conventional page table walk using the shifted virtual memory address. Because the topmost set of bits of the virtual memory address that are used as an index into the highest-level page table are all ones (1s), the page table entry that is accessed first by the page table walker is the last page table entry in the highest-level page table, which merely points back to the highest-level page table. As a result, the page table walker, which is conventionally configured to traverse a specified number of levels of the multilevel page table when performing a page table walk, will end its page table walk one level “early,” and will return the physical memory address of the page table entry in the lowest-level page table instead of a physical memory address in the system memory.
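The address manipulation used by recursive mapping can be sketched as follows. This is a minimal illustration only, assuming a 48-bit virtual memory address with four 9-bit index fields and a 12-bit offset, 8-byte page table entries, and a recursive entry in the last slot (index 511) of the highest-level page table. The function name `recursive_va` and the final alignment mask are hypothetical choices made for the example and are not taken from the disclosure.

```python
INDEX_BITS = 9      # assumed width of each page table index field
VA_BITS = 48        # assumed number of translated virtual address bits
ENTRY_SIZE = 8      # assumed size of a page table entry, in bytes


def recursive_va(va: int) -> int:
    """Build an address that a conventional walk resolves to a page table entry."""
    # Right-shift by one index field so every index moves down one level.
    shifted = (va & ((1 << VA_BITS) - 1)) >> INDEX_BITS
    # Populate the vacated topmost index field with ones (the recursive slot).
    ones = ((1 << INDEX_BITS) - 1) << (VA_BITS - INDEX_BITS)
    # Assumption: clear the low bits so the result is aligned to one entry.
    return (shifted | ones) & ~(ENTRY_SIZE - 1)


# Example address with index fields 1, 2, 3, 4 and page offset 0xABC.
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0xABC
pte_va = recursive_va(va)
top_index = (pte_va >> 39) & 0x1FF   # index into the highest-level page table
entry_offset = pte_va & 0xFFF        # original lowest-level index, scaled by entry size
```

Under these assumptions, the topmost index of the shifted address selects the recursive slot, and the original lowest-level index ends up in the offset field, so a conventional walk stops at the lowest-level page table rather than at a memory page.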

While recursive mapping does provide a solution that allows a memory access instruction to access the page table entry itself, this approach does have disadvantages. In particular, recursive mapping requires a dedicated page table entry in each page table of the multilevel page table, which reduces the number of page table entries available for address translation. Additionally, while the recursive page table accesses performed by the page table walker may be cached in a TLB, the cached recursive mappings are usable only for subsequent recursive mappings, which can result in decreased efficiency.

Accordingly, a more efficient mechanism for obtaining physical memory addresses for page table entries in a multilevel page table is desirable.

SUMMARY

Exemplary embodiments disclosed herein include optimizing access to page table entries in processor-based devices. In this regard, in one exemplary embodiment, an instruction decode stage of an execution pipeline of a processor-based device receives a memory access instruction (e.g., a memory load instruction, a memory store instruction, or a memory read/modify/write instruction, as non-limiting examples) that includes a virtual memory address. A page table walker circuit of the processor-based device determines, based on the memory access instruction, a number T of page table walk levels to traverse, where T is greater than zero (0) and less than or equal to a number of page table walk levels required to fully translate the virtual memory address. In some embodiments, the memory access instruction may provide a traverse indicator that explicitly specifies the number T of page table walk levels to traverse, while some embodiments may provide that the number T of page table walk levels to traverse may be determined based on a count of recursive traversals indicated by the virtual memory address. The page table walker next performs a page table walk of T page table walk levels of the multilevel page table, and identifies a physical memory address corresponding to a page table entry of the Tth page table walk level. The processor-based device then performs a memory access operation indicated by the memory access instruction using the physical memory address.

In another exemplary embodiment, a processor-based device is provided. The processor-based device includes a system memory that comprises a multilevel page table made up of a plurality of page tables, each page table comprising a plurality of page table entries. The processor-based device further includes a processing element (PE) that comprises an execution pipeline comprising an instruction decode stage, and a page table walker circuit. The PE is configured to receive, using the instruction decode stage, a memory access instruction comprising a virtual memory address. The PE is further configured to determine, using the page table walker circuit based on the memory access instruction, a number T of page table walk levels to traverse, wherein T is greater than zero (0) and less than or equal to a number of page table walk levels required to fully translate the virtual memory address. The PE is also configured to perform, using the page table walker circuit based on the virtual memory address, a page table walk of T page table walk levels of the multilevel page table. The PE is additionally configured to identify, based on the page table walk, a physical memory address corresponding to a page table entry of the Tth page table walk level. The PE is further configured to perform a memory access operation indicated by the memory access instruction using the physical memory address.

In another exemplary embodiment, a method for optimizing access to page table entries is provided. The method comprises receiving, by an instruction decode stage of an execution pipeline of a processing element (PE) of a processor-based device, a memory access instruction comprising a virtual memory address. The method further comprises determining, by a page table walker circuit of the PE based on the memory access instruction, a number T of page table walk levels to traverse, wherein T is greater than zero (0) and less than or equal to a number of page table walk levels required to fully translate the virtual memory address. The method also comprises performing, by the page table walker circuit of the PE based on the virtual memory address, a page table walk of T page table walk levels of a multilevel page table. The method additionally comprises identifying, based on the page table walk, a physical memory address corresponding to a page table entry of the Tth page table walk level. The method further comprises performing a memory access operation indicated by the memory access instruction using the physical memory address.

In another exemplary embodiment, a non-transitory computer-readable medium is provided, the computer-readable medium having stored thereon computer-executable instructions which, when executed by a processor, cause the processor to receive a memory access instruction comprising a virtual memory address. The computer-executable instructions further cause the processor to determine, based on the memory access instruction, a number T of page table walk levels to traverse, wherein T is greater than zero (0) and less than or equal to a number of page table walk levels required to fully translate the virtual memory address. The computer-executable instructions also cause the processor to perform, based on the virtual memory address, a page table walk of T page table walk levels of a multilevel page table. The computer-executable instructions additionally cause the processor to identify, based on the page table walk, a physical memory address corresponding to a page table entry of the Tth page table walk level. The computer-executable instructions further cause the processor to perform a memory access operation indicated by the memory access instruction using the physical memory address.

Those skilled in the art will appreciate the scope of the present disclosure and realize additional embodiments thereof after reading the following detailed description of the preferred embodiments in association with the accompanying drawing figures.

BRIEF DESCRIPTION OF THE DRAWING FIGURES

The accompanying drawing figures incorporated in and forming a part of this specification illustrate several embodiments of the disclosure, and together with the description serve to explain the principles of the disclosure.

FIG. 1 is a schematic diagram of an exemplary processor-based device that includes a processing element (PE) comprising a page table walker circuit configured to optimize access to page table entries;

FIG. 2 is a block diagram illustrating an exemplary traversal of the multilevel page table of FIG. 1 and its constituent page tables for optimizing access to page table entries;

FIGS. 3A-3C are block diagrams illustrating an exemplary memory load instruction, an exemplary memory store instruction, and an exemplary memory read/modify/write instruction, respectively, corresponding to the memory access instruction of FIG. 1;

FIGS. 4A and 4B are flowcharts illustrating exemplary operations for optimizing access to page table entries; and

FIG. 5 is a block diagram of an exemplary processor-based device, such as the processor-based device of FIG. 1, that is configured to optimize access to page table entries.

DETAILED DESCRIPTION

Exemplary embodiments disclosed herein include optimizing access to page table entries in processor-based devices. In one exemplary embodiment, an instruction decode stage of an execution pipeline of a processor-based device receives a memory access instruction (e.g., a memory load instruction, a memory store instruction, or a memory read/modify/write instruction, as non-limiting examples) that includes a virtual memory address. A page table walker circuit of the processor-based device determines, based on the memory access instruction, a number T of page table walk levels to traverse, where T is greater than zero (0) and less than or equal to a number of page table walk levels required to fully translate the virtual memory address. In some embodiments, the memory access instruction may provide a traverse indicator that explicitly specifies the number T of page table walk levels to traverse, while some embodiments may provide that the number T of page table walk levels to traverse may be determined based on a count of recursive traversals indicated by the virtual memory address. The page table walker next performs a page table walk of T page table walk levels of the multilevel page table, and identifies a physical memory address corresponding to a page table entry of the Tth page table walk level. The processor-based device then performs a memory access operation indicated by the memory access instruction using the physical memory address.

In this regard, FIG. 1 illustrates an exemplary processor-based device 100 that provides a processing element (PE) 102 for processing executable instructions. The PE 102 may comprise a central processing unit (CPU) having one or more processor cores, or may comprise an individual processor core comprising a logical execution unit and associated caches and functional units. The PE 102 of FIG. 1 includes an execution pipeline 104 that is configured to execute an instruction stream comprising computer-executable instructions. In the example of FIG. 1, the execution pipeline 104 includes an instruction fetch stage 106 for retrieving instructions for execution, an instruction decode stage 108 for translating fetched instructions into control signals for instruction execution, and an instruction execute stage 110 for actually performing instruction execution. It is to be understood that some embodiments of the processor-based device 100 may comprise multiple PEs 102 rather than the single PE 102 shown in the example of FIG. 1, and further that some embodiments of the PE 102 may include fewer or more stages within the execution pipeline 104 than those illustrated in the example of FIG. 1.

The PE 102 of FIG. 1 is communicatively coupled to a system memory 112, which stores a multilevel page table 114 comprising a plurality of page tables 116(0)-116(P) for use in virtual-to-physical address translation. The PE 102 of FIG. 1 also includes a page table walker circuit 118 that embodies logic for performing page table walks on the multilevel page table 114 to translate virtual memory addresses into physical memory addresses. Some embodiments of the PE 102 further include a translation lookaside buffer (TLB) 120 for caching recent translations of virtual memory addresses to physical memory addresses for subsequent reuse. The structure and functionality of the multilevel page table 114 and the page table walker circuit 118 for performing virtual-to-physical address translation are discussed in greater detail below with respect to FIG. 2.

The processor-based device 100 of FIG. 1 and the constituent elements thereof may encompass any one of known digital logic elements, semiconductor circuits, processing cores, and/or memory structures, among other elements, or combinations thereof. Embodiments described herein are not restricted to any particular arrangement of elements, and the disclosed techniques may be easily extended to various structures and layouts on semiconductor sockets or packages. It is to be understood that some embodiments of the processor-based device 100 may include elements in addition to those illustrated in FIG. 1. For example, the PE 102 may further include one or more instruction caches, unified caches, memory controllers, interconnect buses, and/or additional memory devices, caches, and/or controller circuits.

As discussed above, circumstances may arise in which it is desirable to allow a software process being executed by the PE 102 to modify the contents of the page tables 116(0)-116(P) of the multilevel page table 114. As non-limiting examples, the software process may need to update a physical memory address stored in a page table entry of one of the page tables 116(0)-116(P), or may need to modify access permissions on a corresponding memory page. To modify a page table entry, a physical memory address of the page table entry itself within the system memory 112 must first be determined. Existing solutions for enabling software processes to access page table entries may involve recursively mapping each of the page tables 116(0)-116(P) of the multilevel page table 114, such that the last page table entry (i.e., the page table entry with the highest index associated with the virtual memory address) of each of the page tables 116(0)-116(P) stores a pointer to that page table 116(0)-116(P). However, recursive mapping requires a dedicated page table entry in each of the page tables 116(0)-116(P) of the multilevel page table 114, which reduces the number of page table entries available for address translation. Additionally, while recursive page table accesses performed by the page table walker circuit 118 may be cached in the TLB 120, the cached recursive mappings are usable only for subsequent recursive mappings, which can result in decreased efficiency.

In this regard, the PE 102 is configured to provide optimized access to page table entries in processor-based devices. In an exemplary embodiment, the instruction decode stage 108 of the execution pipeline 104 receives a memory access instruction 122. The memory access instruction 122 includes a virtual memory address 124, and, in some embodiments, may be a memory load instruction, a memory store instruction, or a memory read/modify/write instruction, as non-limiting examples. The page table walker circuit 118 determines, based on the memory access instruction 122, a number T of page table walk levels to traverse, where T is greater than zero (0) and less than or equal to a number of page table walk levels required to fully translate the virtual memory address 124. In some embodiments, the memory access instruction 122 may provide a traverse indicator 126 that explicitly specifies the number T of page table walk levels to traverse. The traverse indicator 126 may comprise an immediate value operand, or may comprise a register operand indicating a register that stores the number T of page table walk levels to traverse. According to some embodiments, the number T of page table walk levels to traverse may be determined automatically based on a count of recursive traversals indicated by the virtual memory address 124.
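The two ways of determining T described above can be sketched in software as follows. This is a hedged model, not the circuit itself: the class `MemoryAccessInstruction`, its field names, and the register file dictionary are hypothetical. In particular, the rule used for the automatic case (T equals the full number of levels minus the number of leading index fields that are all ones) is only one plausible reading of "a count of recursive traversals indicated by the virtual memory address"; the disclosure does not spell out the rule. A four-level table with 9-bit index fields and a 12-bit offset is assumed.

```python
from dataclasses import dataclass
from typing import Dict, Optional

FULL_LEVELS = 4                      # assumed levels needed for full translation
INDEX_BITS = 9                       # assumed index field width
OFFSET_BITS = 12                     # assumed page offset width
ALL_ONES = (1 << INDEX_BITS) - 1


@dataclass
class MemoryAccessInstruction:
    """Hypothetical decoded form of a memory access instruction."""
    virtual_address: int
    traverse_immediate: Optional[int] = None   # traverse indicator as an immediate
    traverse_register: Optional[str] = None    # traverse indicator as a register name


def count_recursive_traversals(va: int) -> int:
    """Assumed rule: count leading index fields whose bits are all ones."""
    count = 0
    # Stop one field short so that at least one walk level always remains.
    for level in range(FULL_LEVELS - 1):
        shift = OFFSET_BITS + INDEX_BITS * (FULL_LEVELS - 1 - level)
        if (va >> shift) & ALL_ONES != ALL_ONES:
            break
        count += 1
    return count


def determine_levels(instr: MemoryAccessInstruction, registers: Dict[str, int]) -> int:
    """Return T, enforcing 0 < T <= FULL_LEVELS."""
    if instr.traverse_immediate is not None:
        t = instr.traverse_immediate
    elif instr.traverse_register is not None:
        t = registers[instr.traverse_register]
    else:
        t = FULL_LEVELS - count_recursive_traversals(instr.virtual_address)
    if not 0 < t <= FULL_LEVELS:
        raise ValueError(f"T must be in 1..{FULL_LEVELS}, got {t}")
    return t
```

For instance, `determine_levels(MemoryAccessInstruction(0, traverse_immediate=2), {})` yields 2, while an instruction with no traverse indicator and an address containing no all-ones index field yields the full four levels under the assumed rule.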

The page table walker circuit 118 then performs a page table walk of T page table walk levels of the multilevel page table 114, and identifies a physical memory address corresponding to a page table entry of the Tth page table walk level. The PE 102 performs a memory access operation indicated by the memory access instruction 122 using the physical memory address returned by the page table walker circuit 118 (e.g., by executing the memory access instruction 122 using the instruction execute stage 110). In embodiments in which the memory access instruction 122 is a memory load instruction or a memory read/modify/write instruction, performing the memory access operation may include returning a content of a memory location indicated by the physical memory address. Embodiments in which the memory access instruction 122 is a memory store instruction or a memory read/modify/write instruction may provide that performing the memory access operation includes writing store data to a memory location indicated by the physical memory address.

Some embodiments of the PE 102 in which the number T of page table walk levels to traverse may be determined automatically may further provide an optimization selection indicator 128 to enable selective activation of the optimized page table access feature described herein. In such embodiments, after receiving the memory access instruction 122 by the instruction decode stage 108 of the execution pipeline 104, the PE 102 determines whether the optimization selection indicator 128 is in a set state. If so, the operations described above for determining the number T of page table walk levels to traverse and performing the page table walk of T page table walk levels of the multilevel page table 114 are carried out. If the optimization selection indicator 128 is not in a set state, the page table walker circuit 118 performs a page table walk in conventional fashion.

To provide a more detailed description of the structure and functionality of the multilevel page table 114 both in conventional use and in providing optimized access to page table entries, FIG. 2 is provided. As seen in FIG. 2, the virtual memory address 124 of FIG. 1 is being used to traverse the page tables 116(0)-116(3) (i.e., the page tables 116(0)-116(P) where P=3, in this example) of the multilevel page table 114 of FIG. 1. In the example of FIG. 2, the virtual memory address 124 comprises 48 bits that are relevant for virtual memory address translation. The virtual memory address 124 is divided into four (4) bit sets 200, 202, 204, and 206 of nine (9) bits each, and one (1) bit set 208 comprising the lowest 12 bits of the virtual memory address 124. The bit sets 200, 202, 204, and 206 are used as indices into the corresponding page tables 116(0)-116(3), while the bit set 208 is used as an offset into a memory page 210 containing the memory location 212 that ultimately corresponds to the virtual memory address 124.

In conventional operation, the page table walker circuit 118 performs a page table walk that traverses four (4) page table walk levels to translate the virtual memory address 124 into a corresponding physical memory address. First, the page table walker circuit 118 retrieves a base address 214 indicating the physical memory address of the page table 116(0). The base address 214 is then added to the value of the bit set 200 of the virtual memory address 124 to generate a physical memory address of the page table entry 216 of the page table 116(0). This is considered the first page table walk level traversed by the page table walker circuit 118.

Once the physical memory address of the page table entry 216 is determined, the page table walker circuit 118 accesses the physical memory address 218 stored in the page table entry 216, which points to the next page table 116(1) in the multilevel page table 114. The physical memory address 218 is then added to the value of the bit set 202 of the virtual memory address 124 to generate a physical memory address of the page table entry 220 of the page table 116(1). These operations constitute the second page table walk level traversed by the page table walker circuit 118. The page table walk continues in similar fashion, with the third page table walk level using the physical memory address 222 stored in the page table entry 220 and the bit set 204 to generate the physical memory address of the page table entry 224 of the page table 116(2), and the fourth page table walk level using the physical memory address 226 stored in the page table entry 224 and the bit set 206 to generate the physical memory address of the page table entry 228 of the page table 116(3). Finally, the page table walker circuit 118 uses the physical memory address 230 stored in the page table entry 228, in combination with the bit set 208 of the virtual memory address 124, to generate a physical memory address that represents the translation of the virtual memory address 124, and that points to the memory location 212 in the memory page 210.
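The conventional four-level walk of FIG. 2 can be modeled in a few lines of software. The sketch below is illustrative only: system memory is represented as a dictionary from physical address to stored value, each page table entry is assumed to hold a bare physical address with no permission or flag bits, and each index is assumed to be scaled by an 8-byte entry size before being added to the table address. The function name `translate` and the example addresses are hypothetical.

```python
INDEX_BITS = 9     # width of bit sets 200, 202, 204, and 206
OFFSET_BITS = 12   # width of bit set 208
LEVELS = 4         # page tables 116(0)-116(3)
ENTRY_SIZE = 8     # assumed size of one page table entry, in bytes


def translate(memory: dict, base_address: int, va: int) -> int:
    """Walk all four levels and return the translated physical address."""
    table = base_address
    for level in range(LEVELS):
        # Select the bit set for this level, topmost first.
        shift = OFFSET_BITS + INDEX_BITS * (LEVELS - 1 - level)
        index = (va >> shift) & ((1 << INDEX_BITS) - 1)
        entry_address = table + index * ENTRY_SIZE
        # The entry holds the address of the next table (or of the memory page).
        table = memory[entry_address]
    return table + (va & ((1 << OFFSET_BITS) - 1))


# Hypothetical memory image: one populated entry per page table.
memory = {
    0x1000 + 1 * ENTRY_SIZE: 0x2000,   # entry in table 0 points to table 1
    0x2000 + 2 * ENTRY_SIZE: 0x3000,   # entry in table 1 points to table 2
    0x3000 + 3 * ENTRY_SIZE: 0x4000,   # entry in table 2 points to table 3
    0x4000 + 4 * ENTRY_SIZE: 0x9000,   # entry in table 3 points to the memory page
}
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x123
physical = translate(memory, 0x1000, va)
```

With this memory image, the walk ends in the page at 0x9000 and the 12-bit offset 0x123 selects the final memory location.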

In the conventional example described above, the page table walker circuit 118 performs a page table walk of four (4) page table walk levels to translate the virtual memory address 124 into the physical memory address of the memory location 212. However, embodiments of the PE 102 of FIG. 1 for optimizing access to page table entries allow fewer page table walk levels to be performed, which enables software processes to obtain physical memory addresses of the page table entries of the page tables 116(0)-116(P) of the multilevel page table 114 used to perform translation of the virtual memory address 124. For example, to obtain the physical memory address of the page table entry 220 of the page table 116(1), only two (2) page table walk levels need to be traversed: one to use the base address 214 and the bit set 200 to determine the physical memory address of the page table entry 216 of the page table 116(0), and one to use the physical memory address 218 stored in the page table entry 216 along with the bit set 202 to determine the physical memory address of the page table entry 220. Accordingly, executing the memory access instruction 122 of FIG. 1 while specifying two (2) page table walk levels (e.g., explicitly using the traverse indicator 126, or implicitly by the virtual memory address 124 indicating recursive traversals) results in the physical memory address of the page table entry 220 being used for the memory access operation.
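A walk that stops after T levels and returns the address of the page table entry itself can be sketched as below. As before, this is a simplified software model under stated assumptions rather than the walker circuit: memory is a dictionary of physical addresses, entries hold bare addresses, indices are scaled by an assumed 8-byte entry size, and the name `walk_levels` and the example addresses are hypothetical.

```python
INDEX_BITS = 9
OFFSET_BITS = 12
LEVELS = 4
ENTRY_SIZE = 8   # assumed size of one page table entry, in bytes


def walk_levels(memory: dict, base_address: int, va: int, t: int) -> int:
    """Traverse t page table walk levels; return the address of the t-th level entry."""
    if not 0 < t <= LEVELS:
        raise ValueError(f"T must be in 1..{LEVELS}, got {t}")
    table = base_address
    for level in range(t):
        shift = OFFSET_BITS + INDEX_BITS * (LEVELS - 1 - level)
        index = (va >> shift) & ((1 << INDEX_BITS) - 1)
        entry_address = table + index * ENTRY_SIZE
        # Follow the pointer only if another level remains to be traversed.
        if level + 1 < t:
            table = memory[entry_address]
    return entry_address


# Hypothetical memory image: one populated entry per page table.
memory = {
    0x1000 + 1 * ENTRY_SIZE: 0x2000,
    0x2000 + 2 * ENTRY_SIZE: 0x3000,
    0x3000 + 3 * ENTRY_SIZE: 0x4000,
    0x4000 + 4 * ENTRY_SIZE: 0x9000,
}
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x123
second_level_entry = walk_levels(memory, 0x1000, va, 2)
```

Here `second_level_entry` plays the role of the physical memory address of the page table entry 220: the walk reads only the first-level entry and then computes, but does not dereference, the second-level entry address.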

As noted above, the memory access instruction 122 of FIG. 1 may be a memory load instruction, a memory store instruction, or a memory read/modify/write instruction, as non-limiting examples. In this regard, FIGS. 3A-3C illustrate an exemplary memory load instruction, an exemplary memory store instruction, and an exemplary read/modify/write instruction, respectively, corresponding to the memory access instruction 122 of FIG. 1. In FIG. 3A, a memory load instruction 300, corresponding to the memory access instruction 122 of FIG. 1 in some embodiments, includes a virtual memory address 302 corresponding to the virtual memory address 124 of FIG. 1. In some embodiments, the memory load instruction 300 may also include a traverse indicator 304 corresponding in functionality to the traverse indicator 126 of FIG. 1.

Similarly, in FIG. 3B, a memory store instruction 306, which according to some embodiments may correspond to the memory access instruction 122 of FIG. 1, provides a virtual memory address 308 and a traverse indicator 310 corresponding to the virtual memory address 124 and the traverse indicator 126, respectively, of FIG. 1. The memory store instruction 306 further includes store data 312, representing data to be written to the memory location indicated by the physical memory address resulting from the page table walk of the T page table walk levels of the multilevel page table 114 described above with respect to FIGS. 1 and 2. In some embodiments, the store data 312 may comprise an immediate value to be written to the memory location, or may comprise a register operand indicating a register that stores data to be written to the memory location.

FIG. 3C illustrates a memory read/modify/write instruction 312, which may correspond to the memory access instruction 122 of FIG. 1 in some embodiments. Like the memory store instruction 306, the memory read/modify/write instruction 312 provides a virtual memory address 314 and a traverse indicator 316 corresponding to the virtual memory address 124 and the traverse indicator 126, respectively, of FIG. 1. The memory read/modify/write instruction 312 also includes store data 318, which represents data to be written to the memory location indicated by the physical memory address resulting from the page table walk of the T page table walk levels of the multilevel page table 114 described above with respect to FIGS. 1 and 2. The store data 318 according to some embodiments may comprise an immediate value to be written to the memory location, or may comprise a register operand indicating a register that stores data to be written to the memory location.

It is to be understood that the memory load instruction 300, the memory store instruction 306, and the memory read/modify/write instruction 312 in some embodiments may each be implemented within the PE 102 as dedicated instructions with unique opcodes provided by an instruction set architecture (ISA) of the PE 102. Alternatively or additionally, the memory load instruction 300, the memory store instruction 306, and/or the memory read/modify/write instruction 312 may be conventional memory access instructions to which additional operands and/or opcode bits are added to accomplish the functionality described herein.

FIGS. 4A and 4B illustrate exemplary operations 400 for optimizing access to page table entries by the processor-based device 100 of FIG. 1. For the sake of clarity, elements of FIGS. 1 and 2 are referenced in describing FIGS. 4A and 4B. The operations 400 in FIG. 4A, according to some embodiments, begin with the instruction decode stage 108 of the execution pipeline 104 of the PE 102 of the processor-based device 100 receiving the memory access instruction 122 comprising the virtual memory address 124 (block 402). In embodiments in which the PE 102 provides the optimization selection indicator 128, the page table walker circuit 118 may determine whether the optimization selection indicator 128 of the PE 102 is in a set state (block 404). If not, the page table walker circuit 118 performs a conventional page table walk (block 406).

However, if the PE 102 determines at decision block 404 that the optimization selection indicator 128 is in a set state, or if the PE 102 does not provide the optimization selection indicator 128, the page table walker circuit 118 of the PE 102 determines, based on the memory access instruction 122, the number T of page table walk levels to traverse, wherein T is greater than zero (0) and less than or equal to a number of page table walk levels required to fully translate the virtual memory address 124 (block 408). In some embodiments, the operations of block 408 for determining the number T of page table walk levels to traverse may be based on the traverse indicator 126 (block 410). Some embodiments may provide that the operations of block 408 for determining the number T of page table walk levels to traverse may be based on a count of one or more recursive traversals indicated by the virtual memory address 124 (block 412). Processing then resumes at block 414 of FIG. 4B.
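The decision made at blocks 404 through 408 can be summarized by the following sketch. It is a simplified model of the control flow only; the function name `select_walk`, the string labels, and the use of `None` to represent a PE that does not provide the optimization selection indicator are hypothetical conventions chosen for the example.

```python
from typing import Optional, Tuple


def select_walk(
    full_levels: int,
    requested_levels: int,
    optimization_indicator: Optional[bool],
) -> Tuple[str, int]:
    """Choose between a conventional walk and a T-level walk.

    optimization_indicator is None when the PE provides no such indicator,
    True when the indicator is in a set state, and False otherwise.
    """
    # Blocks 404 and 406: indicator provided but not set, so walk conventionally.
    if optimization_indicator is not None and not optimization_indicator:
        return ("conventional", full_levels)
    # Block 408: T must satisfy 0 < T <= number of levels for full translation.
    if not 0 < requested_levels <= full_levels:
        raise ValueError(f"T must be in 1..{full_levels}, got {requested_levels}")
    return ("page_table_entry", requested_levels)
```

For a four-level table, `select_walk(4, 2, True)` and `select_walk(4, 2, None)` both select a two-level walk, while `select_walk(4, 2, False)` falls back to the conventional walk.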

Referring now to FIG. 4B, the page table walker circuit 118 of the PE 102 next performs, based on the virtual memory address 124, a page table walk of T page table walk levels of the multilevel page table 114 (block 414). According to some embodiments, the TLB 120 may cache the page table walk of the T page table walk levels of the multilevel page table 114 (block 416). In such embodiments, the operations of block 414 for performing the page table walk of the T page table walk levels of the multilevel page table 114 may involve accessing a previously cached page table walk in the TLB 120 in response to a hit on the TLB 120. The page table walker circuit 118 then identifies, based on the page table walk, a physical memory address corresponding to a page table entry (such as the page table entry 220 of FIG. 2) of the Tth page table walk level (block 418). The PE 102 performs a memory access operation indicated by the memory access instruction 122 using the physical memory address (block 420). In some embodiments in which the memory access instruction 122 is the memory load instruction 300 of FIG. 3A or the memory read/modify/write instruction 312 of FIG. 3C, the operations of block 420 for performing the memory access operation may include returning a content of a memory location indicated by the physical memory address (block 422). Some embodiments in which the memory access instruction 122 is the memory store instruction 306 of FIG. 3B or the memory read/modify/write instruction 312 of FIG. 3C may provide that the operations of block 420 for performing the memory access operation may include writing the store data 312 to a memory location indicated by the physical memory address (block 424).
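The memory access operation of blocks 420 through 424 can be modeled as shown below. This sketch assumes memory is a dictionary keyed by physical address and that the physical address of the page table entry has already been identified by the page table walk. The function name `perform_access` and the operation labels are hypothetical, and treating the read/modify/write instruction as "return the old content, then write the store data" is an assumption, since the disclosure does not define the modify step.

```python
from typing import Dict, Optional


def perform_access(
    memory: Dict[int, int],
    physical_address: int,
    kind: str,
    store_data: Optional[int] = None,
) -> Optional[int]:
    """Apply a load, store, or read/modify/write to one page table entry."""
    if kind == "load":
        # Block 422: return the content of the memory location.
        return memory[physical_address]
    if kind == "store":
        # Block 424: write the store data to the memory location.
        memory[physical_address] = store_data
        return None
    if kind == "read_modify_write":
        # Assumed semantics: both blocks 422 and 424, in that order.
        previous = memory[physical_address]
        memory[physical_address] = store_data
        return previous
    raise ValueError(f"unknown memory access kind: {kind}")


# Hypothetical page table entry at 0x2010 that currently points to 0x3000.
memory = {0x2010: 0x3000}
loaded = perform_access(memory, 0x2010, "load")
old = perform_access(memory, 0x2010, "read_modify_write", 0x5000)
```

After these two calls, the entry at 0x2010 holds the new pointer 0x5000, and both `loaded` and `old` hold the original pointer.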

FIG. 5 is a block diagram of an exemplary processor-based device 500, such as the processor-based device 100 of FIG. 1, that provides optimized access to page table entries. The processor-based device 500 may be a circuit or circuits included in an electronic board card, such as a printed circuit board (PCB), a server, a personal computer, a desktop computer, a laptop computer, a personal digital assistant (PDA), a computing pad, a mobile device, or any other device, and may represent, for example, a server or a user's computer. In this example, the processor-based device 500 includes a processor 502. The processor 502 represents one or more general-purpose processing circuits, such as a microprocessor, central processing unit, or the like, and may correspond to the PE 102 of FIG. 1. The processor 502 is configured to execute processing logic in instructions for performing the operations and steps discussed herein. In this example, the processor 502 includes an instruction cache 504 for temporary, fast access memory storage of instructions and an instruction processing circuit 510. Fetched or prefetched instructions from a memory, such as from a system memory 508 over a system bus 506, are stored in the instruction cache 504. The instruction processing circuit 510 is configured to process the instructions fetched into the instruction cache 504 for execution.

The processor 502 and the system memory 508 are coupled to the system bus 506 and can intercouple peripheral devices included in the processor-based device 500. As is well known, the processor 502 communicates with these other devices by exchanging address, control, and data information over the system bus 506. For example, the processor 502 can communicate bus transaction requests to a memory controller 512 in the system memory 508 as an example of a peripheral device. Although not illustrated in FIG. 5, multiple system buses 506 could be provided, wherein each system bus constitutes a different fabric. In this example, the memory controller 512 is configured to provide memory access requests to a memory array 514 in the system memory 508. The memory array 514 comprises an array of storage bit cells for storing data. The system memory 508 may be a read-only memory (ROM), flash memory, dynamic random access memory (DRAM), such as synchronous DRAM (SDRAM), etc., or a static memory (e.g., flash memory, static random access memory (SRAM), etc.), as non-limiting examples.

Other devices can be connected to the system bus 506. As illustrated in FIG. 5, these devices can include the system memory 508, one or more input devices 516, one or more output devices 518, a modem 524, and one or more display controllers 520, as examples. The input device(s) 516 can include any type of input device, including, but not limited to, input keys, switches, voice processors, etc. The output device(s) 518 can include any type of output device, including, but not limited to, audio, video, other visual indicators, etc. The modem 524 can be any device configured to allow exchange of data to and from a network 526. The network 526 can be any type of network, including, but not limited to, a wired or wireless network, a private or public network, a local area network (LAN), a wireless local area network (WLAN), a wide area network (WAN), a BLUETOOTH™ network, and the Internet. The modem 524 can be configured to support any type of communications protocol desired. The processor 502 may also be configured to access the display controller(s) 520 over the system bus 506 to control information sent to one or more displays 522. The display(s) 522 can include any type of display, including, but not limited to, a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, etc.

The processor-based device 500 in FIG. 5 may include a set of instructions 528 to be executed by the processor 502 for any application desired according to the instructions. The instructions 528 may be stored in the system memory 508, processor 502, and/or instruction cache 504 as examples of non-transitory computer-readable medium 530. The instructions 528 may also reside, completely or at least partially, within the system memory 508 and/or within the processor 502 during their execution. The instructions 528 may further be transmitted or received over the network 526 via the modem 524, such that the network 526 includes the computer-readable medium 530.

While the computer-readable medium 530 is shown in an exemplary embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions 528. The term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by a processing device and that cause the processing device to perform any one or more of the methodologies of the embodiments disclosed herein. The term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical medium, and magnetic medium.

The embodiments disclosed herein include various steps. The steps of the embodiments disclosed herein may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps. Alternatively, the steps may be performed by a combination of hardware and software.

The embodiments disclosed herein may be provided as a computer program product, or software process, that may include a machine-readable medium (or computer-readable medium) having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the embodiments disclosed herein. A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes: a machine-readable storage medium (e.g., ROM, random access memory (“RAM”), a magnetic disk storage medium, an optical storage medium, flash memory devices, etc.), and the like.

Unless specifically stated otherwise and as apparent from the previous discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing,” “computing,” “determining,” “displaying,” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission, or display devices.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatuses to perform the required method steps. The required structure for a variety of these systems will appear from the description above. In addition, the embodiments described herein are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the embodiments as described herein.

Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithms described in connection with the embodiments disclosed herein may be implemented as electronic hardware, instructions stored in memory or in another computer-readable medium and executed by a processor or other processing device, or combinations of both. The components of the processor-based devices described herein may be employed in any circuit, hardware component, integrated circuit (IC), or IC chip, as examples. Memory disclosed herein may be any type and size of memory and may be configured to store any type of information desired. To clearly illustrate this interchangeability, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. How such functionality is implemented depends on the particular application, design choices, and/or design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.

The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic device, a discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. Furthermore, a controller may be a processor. A processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).

The embodiments disclosed herein may be embodied in hardware and in instructions that are stored in hardware, and may reside, for example, in RAM, flash memory, ROM, Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a remote station. In the alternative, the processor and the storage medium may reside as discrete components in a remote station, base station, or server.

It is also noted that the operational steps described in any of the exemplary embodiments herein are described to provide examples and discussion. The operations described may be performed in numerous different sequences other than the illustrated sequences. Furthermore, operations described in a single operational step may actually be performed in a number of different steps. Additionally, one or more operational steps discussed in the exemplary embodiments may be combined. Those of skill in the art will also understand that information and signals may be represented using any of a variety of technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips, that may be referenced throughout the above description, may be represented by voltages, currents, electromagnetic waves, magnetic fields, or particles, optical fields or particles, or any combination thereof.

Unless otherwise expressly stated, it is in no way intended that any method set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not actually recite an order to be followed by its steps, or it is not otherwise specifically stated in the claims or descriptions that the steps are to be limited to a specific order, it is in no way intended that any particular order be inferred.

It will be apparent to those skilled in the art that various modifications and variations can be made without departing from the spirit or scope of the invention. Since modifications, combinations, sub-combinations and variations of the disclosed embodiments incorporating the spirit and substance of the invention may occur to persons skilled in the art, the invention should be construed to include everything within the scope of the appended claims and their equivalents.

What is claimed is:
 1. A processor-based device, comprising: a system memory comprising a multilevel page table comprising a plurality of page tables, each page table comprising a plurality of page table entries; and a processing element (PE) comprising: an execution pipeline comprising an instruction decode stage; and a page table walker circuit; the PE configured to: receive, using the instruction decode stage, a memory access instruction comprising a virtual memory address; determine, using the page table walker circuit based on the memory access instruction, a number T of page table walk levels to traverse, wherein T is greater than zero (0) and less than or equal to a number of page table walk levels required to fully translate the virtual memory address; perform, using the page table walker circuit based on the virtual memory address, a page table walk of T page table walk levels of the multilevel page table; identify, based on the page table walk, a physical memory address corresponding to a page table entry of the Tth page table walk level; and perform a memory access operation indicated by the memory access instruction using the physical memory address.
 2. The processor-based device of claim 1, wherein: the memory access instruction further comprises a traverse indicator that indicates the number T of page table walk levels to traverse; and the PE is configured to determine the number T of page table walk levels to traverse based on the traverse indicator.
 3. The processor-based device of claim 1, wherein: the multilevel page table is configured to support recursive traversals; the virtual memory address indicates one or more recursive traversals of the multilevel page table; and the PE is configured to determine the number T of page table walk levels to traverse based on a count of the one or more recursive traversals indicated by the virtual memory address.
 4. The processor-based device of claim 3, wherein: the PE further comprises an optimization selection indicator; and the PE is configured to determine the number T of page table walk levels to traverse based on the count of the one or more recursive traversals indicated by the virtual memory address responsive to the optimization selection indicator being in a set state.
 5. The processor-based device of claim 1, wherein the PE further comprises a translation lookaside buffer (TLB) configured to cache the page table walk of T page table walk levels of the multilevel page table.
 6. The processor-based device of claim 1, wherein: the memory access instruction comprises a memory load instruction; and the PE is configured to perform the memory access operation indicated by the memory access instruction using the physical memory address by being configured to return a content of a memory location indicated by the physical memory address.
 7. The processor-based device of claim 1, wherein: the memory access instruction comprises a memory store instruction; the memory store instruction further comprises store data; and the PE is configured to perform the memory access operation indicated by the memory access instruction using the physical memory address by being configured to write the store data to a memory location indicated by the physical memory address.
 8. The processor-based device of claim 1, wherein: the memory access instruction comprises a memory read/modify/write instruction; the memory read/modify/write instruction further comprises store data; and the PE is configured to perform the memory access operation indicated by the memory access instruction using the physical memory address by being configured to: return a content of a memory location indicated by the physical memory address; and write the store data to a memory location indicated by the physical memory address.
 9. A method for optimizing access to page table entries, comprising: receiving, by an instruction decode stage of an execution pipeline of a processing element (PE) of a processor-based device, a memory access instruction comprising a virtual memory address; determining, by a page table walker circuit of the PE based on the memory access instruction, a number T of page table walk levels to traverse, wherein T is greater than zero (0) and less than or equal to a number of page table walk levels required to fully translate the virtual memory address; performing, by the page table walker circuit of the PE based on the virtual memory address, a page table walk of T page table walk levels of a multilevel page table; identifying, based on the page table walk, a physical memory address corresponding to a page table entry of the Tth page table walk level; and performing a memory access operation indicated by the memory access instruction using the physical memory address.
 10. The method of claim 9, wherein: the memory access instruction further comprises a traverse indicator that indicates the number T of page table walk levels to traverse; and determining the number T of page table walk levels to traverse is based on the traverse indicator.
 11. The method of claim 9, wherein: the multilevel page table is configured to support recursive traversals; the virtual memory address indicates one or more recursive traversals of the multilevel page table; and determining the number T of page table walk levels to traverse is based on a count of the one or more recursive traversals indicated by the virtual memory address.
 12. The method of claim 11, wherein determining the number T of page table walk levels to traverse based on the count of the one or more recursive traversals indicated by the virtual memory address is responsive to an optimization selection indicator of the PE being in a set state.
 13. The method of claim 9, further comprising caching, by a translation lookaside buffer (TLB) of the PE, the page table walk of T page table walk levels of the multilevel page table.
 14. The method of claim 9, wherein: the memory access instruction comprises a memory load instruction; and performing the memory access operation indicated by the memory access instruction using the physical memory address comprises returning a content of a memory location indicated by the physical memory address.
 15. The method of claim 9, wherein: the memory access instruction comprises a memory store instruction; the memory store instruction further comprises store data; and performing the memory access operation indicated by the memory access instruction using the physical memory address comprises writing the store data to a memory location indicated by the physical memory address.
 16. The method of claim 9, wherein: the memory access instruction comprises a memory read/modify/write instruction; the memory read/modify/write instruction further comprises store data; and performing the memory access operation indicated by the memory access instruction using the physical memory address comprises: returning a content of a memory location indicated by the physical memory address; and writing the store data to a memory location indicated by the physical memory address.
 17. A non-transitory computer-readable medium having stored thereon computer-executable instructions which, when executed by a processor, cause the processor to: receive a memory access instruction comprising a virtual memory address; determine, based on the memory access instruction, a number T of page table walk levels to traverse, wherein T is greater than zero (0) and less than or equal to a number of page table walk levels required to fully translate the virtual memory address; perform, based on the virtual memory address, a page table walk of T page table walk levels of a multilevel page table; identify, based on the page table walk, a physical memory address corresponding to a page table entry of the Tth page table walk level; and perform a memory access operation indicated by the memory access instruction using the physical memory address.
 18. The non-transitory computer-readable medium of claim 17, wherein: the memory access instruction further comprises a traverse indicator that indicates the number T of page table walk levels to traverse; and the computer-executable instructions cause the processor to determine the number T of page table walk levels to traverse based on the traverse indicator.
 19. The non-transitory computer-readable medium of claim 17, wherein: the multilevel page table is configured to support recursive traversals; the virtual memory address indicates one or more recursive traversals of the multilevel page table; and the computer-executable instructions cause the processor to determine the number T of page table walk levels to traverse based on a count of the one or more recursive traversals indicated by the virtual memory address.
 20. The non-transitory computer-readable medium of claim 19, wherein the computer-executable instructions cause the processor to determine the number T of page table walk levels to traverse based on the count of the one or more recursive traversals indicated by the virtual memory address responsive to an optimization selection indicator being in a set state.